Simultaneous Optimization of Geodesics and Fréchet Means
Frederik Möbius Rygaard, Søren Hauberg, Steen Markvorsen
A central part of geometric statistics is to compute the Fréchet mean. This is a well-known intrinsic mean on a Riemannian manifold that minimizes the sum of squared Riemannian distances from the mean point to all other data points. The Fréchet mean is simple to define and generalizes the Euclidean mean, but for most manifolds even minimizing the Riemannian distance involves solving an optimization problem. Therefore, numerical computations of the Fréchet mean require solving an embedded optimization problem in each iteration. We introduce the GEORCE-FM algorithm to simultaneously compute the Fréchet mean and Riemannian distances in each iteration in a local chart, making it faster than previous methods. We extend the algorithm to Finsler manifolds and introduce an adaptive extension such that GEORCE-FM scales to a large number of data points. Theoretically, we show that GEORCE-FM has global convergence and local quadratic convergence and prove that the adaptive extension converges in expectation to the Fréchet mean. We further empirically demonstrate that GEORCE-FM outperforms existing baseline methods to estimate the Fréchet mean in terms of both accuracy and runtime.
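The definition above — the point minimizing the sum of squared geodesic distances to the data — can be made concrete on a manifold whose exponential and logarithm maps have closed forms. A minimal sketch (this is not GEORCE-FM, just naive Riemannian gradient descent on the unit sphere; the function names are illustrative):

```python
import numpy as np

def sphere_log(p, q):
    """Log map on the unit sphere: tangent vector at p pointing toward q."""
    v = q - np.dot(p, q) * p                 # project q onto the tangent space at p
    nv = np.linalg.norm(v)
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))  # geodesic distance
    return np.zeros_like(p) if nv < 1e-12 else (theta / nv) * v

def sphere_exp(p, v):
    """Exp map on the unit sphere: follow tangent vector v from p along a geodesic."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p
    return np.cos(nv) * p + np.sin(nv) * (v / nv)

def frechet_mean(points, iters=100, step=1.0):
    """Riemannian gradient descent for the Frechet mean on the sphere.
    The (negative) gradient of the sum-of-squared-distances objective
    is the mean of the log maps from the current estimate to the data."""
    mu = points[0] / np.linalg.norm(points[0])
    for _ in range(iters):
        grad = np.mean([sphere_log(mu, q) for q in points], axis=0)
        mu = sphere_exp(mu, step * grad)
    return mu
```

On the sphere each iteration is cheap because the geodesics are known in closed form; the point of GEORCE-FM is precisely that on a general manifold each such distance is itself an optimization problem, which it solves jointly with the mean update.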
De-singularity Subgradient for the $q$-th-Powered $\ell_p$-Norm Weber Location Problem
Zhao-Rong Lai, Xiaotian Wu, Liangda Fang, Ziliang Chen, Cheng Li
The Weber location problem is widely used in several artificial intelligence scenarios. However, the gradient of the objective does not exist at a considerable set of singular points. Recently, a de-singularity subgradient method has been proposed to fix this problem, but it can only handle the $q$-th-powered $\ell_2$-norm case ($1\leqslant q<2$), which has only finitely many singular points. In this paper, we further establish the de-singularity subgradient for the $q$-th-powered $\ell_p$-norm case with $1\leqslant q\leqslant p$ and $1\leqslant p<2$, which covers all the remaining unsolved situations in this problem. This is a challenging task because the singular set is a continuum. The geometry of the objective function is also complicated, which makes the characterizations of the subgradients, the minimum, and the descent direction difficult. We develop a $q$-th-powered $\ell_p$-norm Weiszfeld Algorithm without Singularity ($q$P$p$NWAWS) for this problem, which ensures convergence and the descent property of the objective function. Extensive experiments on six real-world data sets demonstrate that $q$P$p$NWAWS successfully solves the singularity problem and achieves a linear computational convergence rate in practical scenarios.
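For context, the classical Weiszfeld iteration that this line of work generalizes (the $q=1$, $p=2$ case) is a simple fixed-point scheme: each step replaces the current iterate with a distance-weighted average of the data points. A minimal sketch, with the singular case handled by the naive shortcut of snapping to the nearest data point — exactly the kind of ad hoc handling the de-singularity subgradient is designed to replace with a principled rule:

```python
import numpy as np

def weiszfeld(points, iters=200, eps=1e-12):
    """Classical Weiszfeld iteration for the unweighted l2 Weber point:
    minimize sum_i ||x - a_i||_2 over x."""
    points = np.asarray(points, dtype=float)
    x = points.mean(axis=0)                  # start at the centroid
    for _ in range(iters):
        d = np.linalg.norm(points - x, axis=1)
        if np.any(d < eps):                  # iterate hit a data point: the
            return points[np.argmin(d)]      # gradient is singular here (naive exit)
        w = 1.0 / d                          # inverse-distance weights
        x = (w[:, None] * points).sum(axis=0) / w.sum()
    return x
```

For the square `[(0,0), (1,0), (0,1), (1,1)]` the iteration settles at the center `(0.5, 0.5)`. Note the singular exit above is a simplification: a data point reached this way is not necessarily the minimizer, which is the problem the subgradient analysis addresses.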
Local Complexity of Stochastic Convex Optimization
We extend the traditional worst-case, minimax analysis of stochastic convex optimization by introducing a localized form of minimax complexity for individual functions. Our main result gives function-specific lower and upper bounds on the number of stochastic subgradient evaluations needed to optimize either the function or its "hardest local alternative" to a given numerical precision. The bounds are expressed in terms of a localized and computational analogue of the modulus of continuity that is central to statistical minimax analysis. We show how the computational modulus of continuity can be explicitly calculated in concrete cases, and how it relates to the curvature of the function at the optimum. We also prove a superefficiency result that demonstrates that it is a meaningful benchmark, acting as a computational analogue of the Fisher information in statistical estimation. The nature and practical implications of the results are demonstrated in simulations.
The Novel Adaptive Fractional Order Gradient Descent Algorithms Design via Robust Control
Jiaxu Liu, Song Chen, Shengze Cai, Chao Xu
The vanilla fractional order gradient descent may oscillate around a region near the global minimum instead of converging to the exact minimum point, or may even diverge, even when the objective function is strongly convex. To address this problem, we propose a novel adaptive fractional order gradient descent (AFOGD) method and a novel adaptive fractional order accelerated gradient descent (AFOAGD) method. Inspired by the quadratic constraints and Lyapunov stability analysis from robust control theory, we establish a linear matrix inequality to analyse the convergence of the proposed algorithms. We prove that the proposed algorithms achieve R-linear convergence when the objective function is $\textbf{L}$-smooth and $\textbf{m}$-strongly convex. Several numerical simulations verify the effectiveness and superiority of our proposed algorithms.
Algorithmic Trading Models - Machine Learning
In the fifth article of this series, we will continue to summarise a collection of commonly used technical analysis trading models that steadily increase in mathematical and computational complexity. Typically, these models are most effective on fluctuating or periodic instruments, such as forex pairs or commodities, which is what I have backtested them on. The aim behind each of these models is that they should be objective and systematic, i.e. we should be able to translate them into a trading bot that checks some conditions at the start of each time period and decides whether a buy or sell order should be posted, or whether an already open trade should be closed. Please note that not all of these trading models are successful. In fact, a large number of them were unsuccessful.
Early Stopping Explained!
Early stopping is one of the simplest and most effective regularization techniques used in training neural networks. Usually, during training, the training loss decreases gradually, and if everything goes well on the validation side, the validation loss decreases too. Once the validation loss hits a local minimum, it starts to increase again, which is a signal of overfitting. How can we stop training just before the validation loss rises again, or before the validation accuracy starts decreasing?
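The stopping rule sketched above is usually implemented with a "patience" counter: stop once the validation loss has failed to improve for a fixed number of epochs. A minimal framework-agnostic sketch, where `train_step` and `val_loss_fn` are assumed callbacks supplied by your training setup:

```python
def train_with_early_stopping(train_step, val_loss_fn, max_epochs=100, patience=5):
    """Stop when validation loss has not improved for `patience` consecutive epochs.
    `train_step` runs one epoch of training; `val_loss_fn` returns the current
    validation loss. Returns (epochs actually run, best validation loss seen)."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step()
        loss = val_loss_fn()
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
            # in a real setup, checkpoint the model weights here,
            # so the best model (not the last one) is restored afterwards
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return epoch + 1, best_loss
```

With patience greater than 1, a single noisy uptick in validation loss does not end training; only a sustained rise does.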
A Gentle Introduction to Particle Swarm Optimization
Particle swarm optimization (PSO) is a simple bio-inspired algorithm for searching a solution space for an optimal solution. It differs from many other optimization algorithms in that it needs only the objective function itself: it does not depend on the gradient or any other differential form of the objective. It also has very few hyperparameters. In this tutorial, you will learn the rationale of PSO and its algorithm with an example. Particle swarm optimization was proposed by Kennedy and Eberhart in 1995.
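A minimal sketch of the standard PSO update: each particle's velocity combines inertia with random pulls toward its own best position and the swarm's best position. The search box, coefficient values, and function names here are illustrative assumptions, not part of the original formulation:

```python
import numpy as np

def pso(f, dim=2, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over the box [-5, 5]^dim with a basic particle swarm.
    w: inertia weight; c1: cognitive (personal-best) pull; c2: social (global-best) pull."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))   # particle positions
    v = np.zeros((n_particles, dim))             # particle velocities
    pbest = x.copy()                             # each particle's best position so far
    pbest_val = np.array([f(p) for p in x])
    g = pbest[np.argmin(pbest_val)].copy()       # swarm's best position so far
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, f(g)
```

Note that only `f` is ever called; no gradient appears anywhere, which is exactly the property highlighted above.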
Gradient Descent
Understanding the concept of the gradient is useful for understanding the logic of the gradient descent algorithm. Recall the notion of a stationary point: a point where the derivative (slope) is zero. Gradient descent takes a point on the cost function and, at each iteration, steps so as to drive the slope at the current point toward zero; in other words, it seeks the minimum point, where the slope is zero. Substituting the coordinates of this point into the hypothesis function yields the hypothesis with the least error the model can achieve.
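The iteration described above — step against the slope until it reaches zero — can be sketched in a few lines. The learning rate and the quadratic example below are illustrative choices:

```python
def gradient_descent(grad, x0, lr=0.1, iters=100):
    """Repeatedly step opposite the gradient; near the minimum the slope
    shrinks toward zero, so the steps shrink too."""
    x = x0
    for _ in range(iters):
        x = x - lr * grad(x)
    return x

# Example cost: f(x) = (x - 3)^2, whose derivative is 2(x - 3).
# The slope is zero at x = 3, the minimum point.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)  # converges toward 3.0
```

Each step multiplies the remaining error by a constant factor here (0.8 with `lr=0.1`), so the iterate approaches the stationary point geometrically.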